(12) INTERNATIONAL APPLICATION PUBLISHED UNDER THE PATENT COOPERATION TREATY (PCT) 



CORRECTED VERSION 



(19) World Intellectual Property Organization 
International Bureau

(43) International Publication Date 
20 September 2001 (20.09.2001)







(10) International Publication Number 

WO 01/68807 A2



(51) International Patent Classification: C12N

(21) International Application Number: PCT/US01/08590

(22) International Filing Date: 16 March 2001 (16.03.2001)

(25) Filing Language: English

(26) Publication Language: English



(30) Priority Data:
60/190,362    16 March 2000 (16.03.2000)    US
60/272,847    1 March 2001 (01.03.2001)     US



(71) Applicant (for all designated States except US): FRED
HUTCHINSON CANCER RESEARCH CENTER
[US/US]; Office of Technology Transfer, 1100 Fairview
Avenue North, M/S: C2M 027, Seattle, WA 98109-1024
(US).

(72) Inventors; and
(75) Inventors/Applicants (for US only): VAN STEENSEL,
Bas [NL/NL]; Nieuwegrachtje 1-3, NL-1011 VP Amsterdam
(NL). HENIKOFF, Steven [US/US]; 4711 51st Place
SW, Seattle, WA 98116 (US).

(74) Agents: POOR, Brian, W. et al.; Townsend and Townsend
and Crew LLP, Two Embarcadero Center, 8th Floor, San
Francisco, CA 94111 (US).

(81) Designated States (national): AU, CA, JP, US.

(84) Designated States (regional): European patent (AT, BE,
CH, CY, DE, DK, ES, FI, FR, GB, GR, IE, IT, LU, MC,
NL, PT, SE, TR).

Published: 

- without international search report and to be republished
upon receipt of that report

(48) Date of publication of this corrected version:

27 December 2001 




(54) Title: IDENTIFICATION OF IN VIVO DNA BINDING LOCI OF CHROMATIN PROTEINS USING A TETHERED
NUCLEOTIDE MODIFICATION ENZYME




(57) Abstract: A novel technique is provided, designated DamID, for the identification of DNA loci that interact in vivo with specific
nuclear proteins in eukaryotes. By tethering a DNA modification enzyme, in particular, E. coli DNA adenine methyl transferase
(Dam), to a chromatin protein, the DNA modification enzyme (Dam) can be targeted in vivo to the native binding loci of the
protein, resulting in local DNA modification. Sites of DNA modification can subsequently be mapped using modification-specific
restriction enzymes, antibodies, or DNA array methods. DNA Modification Identification (DamID) has potential for genome-wide
mapping of in vivo target binding sites of chromatin proteins in various eukaryotes.
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28. The method of claim 27, wherein the label is chemiluminescent, an
enzyme, a fluorophor, or a radioactive moiety.

29. The method of claim 28, wherein the fluorescent label is fluorescein,
phycoerythrin (PE), Cy3, Cy5, Cy7, Texas Red, allophycocyanin (APC), Cy7APC, Cascade
Blue, or Cascade Yellow.

30. The method of claim 25, wherein the bound antibody is detected by a
labeled second antibody.

31. The method of claim 24, wherein the array comprises DNA, cDNA,
DNA comprising substantially only chromatin binding regions, RNA, or RNA comprising
substantially only protein binding regions.
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10. The method of claim 9, wherein the antibody is polyclonal,
monoclonal, a single chain antibody, a chimeric antibody, or an antigen binding fragment
thereof.

11. The method of claim 9, wherein the antibody is labeled.

12. The method of claim 11, wherein the label is chemiluminescent, an
enzyme, a fluorophor, or a radioactive moiety.

13. The method of claim 12, wherein the fluorescent label is fluorescein,
phycoerythrin (PE), Cy3, Cy5, Cy7, Texas Red, allophycocyanin (APC), Cy7APC, Cascade
Blue, or Cascade Yellow.

14. The method of claim 9, wherein the bound antibody is detected by a 
labeled second antibody. 

15. The method of claim 8, wherein the array comprises DNA, cDNA,
DNA comprising substantially only chromatin binding regions, RNA, or RNA comprising
substantially only protein binding regions.

16. A method for producing a profile of chromatin protein loci for a cell
population of interest comprising:

transfecting the cell population with a plurality of expression vectors capable
of expressing a plurality of chromatin protein-nucleotide modification enzyme fusion
proteins, each expression vector comprising a nucleic acid encoding a low efficiency
promoter operatively associated with a nucleic acid encoding [...]
nucleic acid encoding a nucleotide modification enzyme;

culturing the transfected cells for a period of time sufficient for expression of
and binding of each of the plurality of chromatin protein-nucleotide modification enzyme
fusion proteins; and

detecting the loci for each of the nucleotide modifications within the
chromatin of the cell population; therefrom determining the profile of chromatin protein loci
for the cell population.
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Dam (p<0.003), but not for Dam-GAF (p=0.25). The somewhat higher noise level in the
Dam-GAF data obtained in the assays may preclude detection of exclusion from HP1
binding sites. Finally, comparison of GAF and DmSir2-1 revealed a subset of genes that
were associated with both proteins. Biochemical analysis may reveal whether the two
proteins can be part of one protein complex, or whether they bind separately to different
regions in the same genes.

To confirm the relative distributions of the three proteins, their
immunocytochemical staining patterns were examined. As provided above in Example 1,
HP1 in Kc cells was associated with a large chromocenter. In contrast, the DmSir2-1-Dam
fusion protein appeared to be associated with the euchromatic compartment, and was
essentially excluded from HP1-containing regions. Likewise, the GAF-Dam fusion protein
was located in the euchromatic compartment and mostly absent from the chromocenter.
These cytological results were in agreement with the molecular mapping data, and confirmed
that GAF and DmSir2-1 were preferentially associated with non-heterochromatic regions.

Although the foregoing invention has been described in some detail by way
of illustration and example for purposes of clarity of understanding, it will be obvious that
certain changes and modifications may be practiced within the scope of the appended claims.
The scope of the invention should, therefore, be determined not with reference to the above
description, but instead should be determined with reference to the appended claims along
with their full scope of equivalents.

All publications and patent documents cited in this application are
incorporated by reference in their entirety for all purposes to the same extent as if each
individual publication or patent document were so individually denoted.
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N-terminus of GAF. Again, results were reproducible, with r in pairwise comparisons of
three experiments ranging from 0.81-0.93. Importantly, the C- and N-terminal fusion
proteins gave similar results (r = 0.80 in two independent comparisons), although Cy3:Cy5
ratios with Dam-GAF cover a smaller dynamic range than with GAF-Dam. These results
strongly suggest that Dam, when fused to either end of GAF, does not interfere with correct
targeting of GAF.

Genes that appear to strongly bind GAF have no common function or
expression pattern. Because in vitro binding assays and in vivo cross-linking studies have
shown that GAF binds GA-rich regulatory elements (Biggin et al., Cell 53:699-711 (1988);
Soeller et al., Mol. Cell. Biol. 13:7961-7970 (1993); Strutt et al., EMBO J. 16:3621-3632
(1997); O'Brien et al., Genes Dev. 9:1098-1110 (1995)), the GAF target loci identified by the
mapping were investigated to determine whether they were enriched in such elements.
Indeed, loci that display moderate to strong GAF binding have significantly higher average
densities of GAGAG (SEQ ID NO: 4) and GAGAGAG (SEQ ID NO: 5) sequences than loci
with low GAF binding (Fig. 5B), providing strong evidence that bona fide target loci of GAF
were identified.
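The motif-density comparison described above can be sketched in Python. This is only an illustration under stated assumptions: the function names, the decision to count overlapping occurrences on both strands, and the per-kilobase unit are hypothetical choices and are not taken from the analysis itself.

```python
def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (A, C, G, T, N only)."""
    complement = {"A": "T", "C": "G", "G": "C", "T": "A", "N": "N"}
    return "".join(complement[base] for base in reversed(seq.upper()))


def count_overlapping(seq: str, motif: str) -> int:
    """Count occurrences of motif in seq, allowing overlaps."""
    count = 0
    start = seq.find(motif)
    while start != -1:
        count += 1
        start = seq.find(motif, start + 1)
    return count


def motif_density_per_kb(seq: str, motif: str) -> float:
    """Motif occurrences per kb, counted on both strands (an assumption)."""
    if not seq:
        raise ValueError("sequence must not be empty")
    seq = seq.upper()
    hits = count_overlapping(seq, motif)
    hits += count_overlapping(reverse_complement(seq), motif)
    return 1000.0 * hits / len(seq)
```

Averaging `motif_density_per_kb(region, "GAGAG")` over a set of low-binding regions and a set of high-binding regions would give the two group means that the paragraph compares.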

Target loci of DmSir2-1: Finally, target loci of a Drosophila homolog of budding yeast Sir2
were mapped by the methods of the present invention. In S. cerevisiae, Sir2 plays a role in
silencing of genes in the silent mating-type loci, telomeric regions, and the rDNA locus
(Guarente, Genes Dev. 14:1021-1026 (2000); Gartenberg, Curr. Opin. Microbiol. 3:132-137
(2000)). In Drosophila, five Sir2-like proteins have been predicted by sequence analysis
(Frye, Biochem. Biophys. Res. Commun. 273:793-798 (2000)). Of these five, the Sir2-like
protein that was found to be most closely related to S. cerevisiae Sir2 was chosen. The
selected protein is referred to herein as DmSir2-1. The homology to yeast Sir2
suggested that DmSir2-1 might be associated with heterochromatin in Drosophila, but no
experimental studies of DmSir2-1 have been reported.

Mapping results obtained with a DmSir2-1-Dam fusion protein are shown in
Fig. 6. DmSir2-1 demonstrated association with numerous genes, in a reproducible fashion
(r=0.81 between two independent experiments). Among the strongest DmSir2-1 binding loci
were several euchromatic, constitutively expressed genes such as genes encoding translation
factors, putative ribosomal proteins, α-tubulin, hsc4 and EIP40. This suggests that DmSir2-1
binds to active genes, unlike yeast Sir2.
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Biol. 19:4366-4378 (1999)). A small number of pericentric target loci of HP1 have been
identified (van Steensel and Henikoff, Nature Biotechnol. 18:424-428 (2000)), but the nature
of the HP1 binding sites on the euchromatic arms is unknown.

A scatter diagram of the hybridization signals measured for Cy3 (Dam-HP1)
vs Cy5 (Dam) showed that the majority of cDNAs display an almost identical Cy3:Cy5 ratio
(i.e., were located on a single diagonal in the scatter diagram), indicating no detectable
association of the corresponding genes with HP1. However, a distinct set of cDNAs
demonstrated a clear offset from this diagonal towards higher Cy3:Cy5 ratios. These
cDNAs must represent target loci of HP1. The absence of data points with lower Cy3:Cy5
ratios demonstrated that tethering of Dam to HP1 caused an increase in methylation of HP1
target loci, but not a decrease in methylation levels of non-target loci.
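The reasoning above (most probes share one background ratio, while targets are offset towards higher Cy3:Cy5 ratios) can be illustrated with a minimal Python sketch. The function name `call_targets`, the use of the median ratio as the background estimate, and the 2-fold default cutoff are assumptions made for illustration only; the description does not prescribe a particular cutoff.

```python
from statistics import median


def call_targets(ratios: dict, fold_over_background: float = 2.0) -> list:
    """Return probe names whose Cy3:Cy5 ratio exceeds the background.

    ratios maps probe name -> Cy3:Cy5 ratio. The background is estimated
    as the median ratio, which assumes that most probes are non-targets.
    """
    if not ratios:
        raise ValueError("no ratios supplied")
    background = median(ratios.values())
    cutoff = background * fold_over_background
    return sorted(name for name, ratio in ratios.items() if ratio > cutoff)
```

Because only an upward offset is tested, probes with ratios below background are never reported, which mirrors the observation that tethering increases methylation at targets without decreasing it elsewhere.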

The probed loci are represented in Fig. 4A on the standard polytene
chromosome map, showing their relative HP1 binding (i.e., Cy3:Cy5 ratios). Most loci
display a constant Cy3:Cy5 ratio (approximately 0.5-0.6), which was interpreted as
non-targeted 'background' methylation. However, several loci demonstrated a considerably
higher ratio, implying HP1 binding. Although the cutoff between 'target' and 'non-target'
Cy3:Cy5 ratios was arbitrary, it is important to note that the differences in Cy3:Cy5 ratios
between probed loci were highly reproducible. Pair-wise comparisons of three independent
experiments showed correlation coefficients between 0.95 and 0.99. Hence, loci that
demonstrated only a mild increase in Cy3:Cy5 ratio over background levels (e.g., gene
CG14967, Fig. 4A) were likely to be associated with HP1 in vivo, although the local HP1
concentration may be lower than at other target loci with higher Cy3:Cy5 ratios. Moreover,
differences in Cy3:Cy5 ratios between genes in the present assay may somewhat
underestimate differences in protein binding. By Southern blot analysis it was found that the
Bari-1 locus displays about 8-fold higher HP1-targeted methylation than the 5S rDNA locus
(Example 1), yet the present microarray analysis indicated only about a 4-fold difference.
Such a microarray-specific compression effect has been observed previously (Pollack et al.,
Nature Genet. 23:41-46 (1999)).
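The pair-wise reproducibility check described above can be sketched as follows. This is a minimal illustration, assuming that each experiment is a list of Cy3:Cy5 ratios in the same probe order; the function names are hypothetical, and the coefficient computed is the ordinary Pearson r (the description does not state which correlation coefficient was used).

```python
from itertools import combinations
from math import sqrt


def pearson_r(x: list, y: list) -> float:
    """Pearson correlation coefficient of two equally long series."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equally long series with >= 2 points")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def pairwise_correlations(experiments: dict) -> dict:
    """Map each pair of experiment names to the Pearson r of their ratios."""
    return {
        (first, second): pearson_r(experiments[first], experiments[second])
        for first, second in combinations(sorted(experiments), 2)
    }
```

Three experiments yield three pair-wise coefficients, which is the form in which the reproducibility is reported above.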

Among the target loci of HP1 detected were genes located near pericentric
heterochromatin, or on the largely heterochromatic chromosome 4. Both the histone gene
cluster (HisC) and the cta gene, located near the centromere on the left arm of chromosome
2, were found to be associated with HP1, in agreement with previous observations provided
above in Example 1. In contrast, the j^b gene, which lies between these two loci, showed
no detectable HP1 binding, suggesting a discontinuous distribution of HP1 in this region.
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citrate, pH 7.0) onto poly-lysine coated microscope slides using an OmniGrid
high-precision robotic gridder (GeneMachines, San Carlos, CA).

One µg of purified methylated DNA was labeled with Cy3- or Cy5-dCTP
(Amersham Pharmacia, Piscataway, NJ) by random priming (Pollack et al., Nature Genet.
23:41-46 (1999)). Labeled experimental and reference DNA samples were mixed and
hybridized to a microarray in 3X SSC in the presence of 20 µg poly[dAdT], 100 µg yeast
tRNA, and 25 µg unlabeled DpnI-digested plasmid encoding the fusion protein that was used
for transfection. Hybridization was performed at 63 °C for 16 hours followed by sequential
washings in 1X SSC, 0.03% SDS, 1X SSC, 0.2X SSC, and 0.05X SSC. Washed arrays were
spun dry in a centrifuge and immediately scanned using a GenePix 4000 fluorescent scanner
(Axon Instruments, Inc., Foster City, CA).

Data analysis: Image processing was performed using GenePix 3.0 image analysis software.
Statistical analysis was performed using StatView software (Abacus Concepts, Berkeley,
CA). Cy3:Cy5 ratios were normalized using Drosophila total genomic DNA (spotted 16
times on each microarray slide) as an internal standard. Thus, a Cy3:Cy5 value of 1
represents the average level of binding of the chromatin protein along the entire genome.
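The normalization step can be sketched in Python as follows. This is a minimal sketch under an assumption: the raw ratio of every probe is divided by the arithmetic mean of the genomic DNA control spots. The description says only that the genomic DNA spots served as an internal standard, so the choice of the mean (rather than, say, the median) and the function name `normalize_ratios` are hypothetical.

```python
from statistics import mean


def normalize_ratios(spot_ratios: dict, genomic_control_ratios: list) -> dict:
    """Scale raw Cy3:Cy5 ratios so that the genomic DNA controls average 1.

    spot_ratios maps probe name -> raw Cy3:Cy5 ratio.
    genomic_control_ratios holds the raw ratios of the total genomic DNA
    spots on the same slide (16 per slide in the procedure described).
    """
    if not genomic_control_ratios:
        raise ValueError("at least one genomic DNA control spot is required")
    reference = mean(genomic_control_ratios)
    return {name: ratio / reference for name, ratio in spot_ratios.items()}
```

After this scaling, a probe with a normalized value above 1 shows more targeted methylation than the genome-wide average, and a value below 1 shows less.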

Five ESTs initially thought to represent unique genes along the euchromatic
arms (based on the available 5' sequence) were identified as HP1 targets. Sequencing of
both 5' and 3' ends indicated that these clones are hybrids of a unique gene and a repetitive
sequence. It was presumed that this was a cloning artifact that occurred during the
construction of the CK library (Kopczynski et al., Proc. Natl. Acad. Sci. USA 95:9973-9978
(1998)). These clones are represented in the Dispersed Elements section of Figures 4
through 6. The fact that these chimeric clones were identified as HP1 targets underscores the
sensitivity of the assay of the present invention.

Analysis of the density of (GA)n elements was carried out as follows. Fifteen
loci showing low GAF-Dam binding (Cy3:Cy5 ratios 0.97 ± 0.09) and 15 loci showing high
GAF-Dam binding (Cy3:Cy5 ratios 2.77 ± 0.65) were selected based on availability of the
complete probe sequence. Corresponding genomic sequences were obtained from the
BDGP/Celera genomic database and covered the region encoding the cDNA fragment
present on the microarray. Any introns smaller than 5 kb, as well as S^l^
upstream and downstream of the probed region were included in the analysis, because
methylation by tethered Dam extends in cis over a few kb. On average, about 7.5 kb was
analyzed per probed region. GAGAG (SEQ ID NO: 4) and GAGAGAG (SEQ ID NO: 5)
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Z4yU, DO J J 


STS Dm0328    BDGP (c)    135, 393
28S (R)       M21017      5018, 5492
SamDC         Y11216      410, 961
ci            U66884      12262, 12865

(a) (R) indicates that the probed region is predominantly present as a tandem repeat array
(b) cDNA sequence; the corresponding genomic sequence (GenBank accession AF211849)
contains two approximately 50 bp introns
(c) Berkeley Drosophila Genome Project database

EXAMPLE 2

The present example provides large-scale mapping of in vivo binding sites of
chromatin proteins, using a combination of targeted DNA modification and microarray
detection. Three distinct chromatin proteins in Drosophila Kc cells were mapped and each
was found to associate with specific sets of genes. HP1 was found, as above, to bind
predominantly to pericentric genes and transposable elements. GAGA factor associates with
euchromatic genes that are enriched in (GA)n motifs. Surprisingly, a Drosophila homolog of
yeast Sir2 was found to associate with several active genes and was excluded from
heterochromatin.

The materials and methods used for microarray detection of modified DNA 
regions peripheral to the binding loci of the DNA binding protein were as follows. 

Plasmids: Vectors for expression of myc-tagged Dam and Dam-HP1 were described above.
A cDNA encoding full-length DmSir2-1 (GenBank accession AF068758) was obtained by
PCR amplification from a Drosophila ovary cDNA library and cloned into pCMycDam as
above, resulting in plasmid pSir2a-MDam. Sequencing of the cloned PCR product revealed
that six nucleotides encoding Phe289 and Gto296 were missing. The same polymorphisms
were also present in a genomic sequence (GenBank AE003639). Dam was fused to the
C-terminus of DmSir2-1 because it had previously been found that addition of Green
Fluorescent Protein to this end of yeast Sir2 did not interfere with correct subnuclear
targeting of Sir2 (Cuperus et al., EMBO J. 19:2641-2651 (2000)).

Full length GAF (the 519 amino acid isoform (Benyajati et al., Nucleic Acids
Res. 25:3345-3353 (1997), incorporated herein by reference) was amplified from plasmid
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chromosome 2, has been speculated to have heterochromatic features (Fitch et al.,
Chromosoma 99:118-124 (1990)). The rDNA repeat has been mapped to the
heterochromatic part of the X chromosome (Hilliker et al., Cell 21:607-619 (1980)), yet
during interphase it is packaged inside the transcriptionally active nucleolus (Scheer et al.,
Curr. Opin. Cell Biol. 11:385-390 (1999)). Somewhat surprisingly, it was found that both loci
(three different regions in HisC, and the 28S gene in the rDNA repeat) displayed
Dam-HP1/Dam methylation ratios that were intermediate between euchromatin and
heterochromatin (Fig. 3C). Since both loci are tandem repeats, it is possible that only a
fraction of the repeats was associated with HP1. Alternatively, the association of HP1 with
these loci may be cell cycle regulated. A similar level of Dam-HP1 targeted methylation
was observed for sequence tag STS Dm0328, which is located in the banded region proximal
to HisC. Possibly, HP1 'spreads' from pericentric heterochromatin into the flanking
euchromatin to include HisC.

Finally, the cubitus interruptus (ci) and S-adenosyl decarboxylase (SamDC)
genes were tested. These genes are located in the banded part of chromosome 4 and in
region 31 on chromosome 2, respectively. Both regions are decorated by antibodies against
HP1 (James et al., Eur. J. Cell Biol. 50:170-180 (1989)). ci showed levels of HP1-targeted
methylation that were lower than in heterochromatic loci, but significantly higher than in
euchromatic loci (Fig. 3B and Fig. 3C), indicating that HP1 is associated with this gene. In
contrast, SamDC showed low levels of targeted methylation, suggesting that this gene was
not abundantly associated with HP1. A detailed map of HP1 associations can be obtained in
the future by systematic analysis of a large number of sequences throughout the genome.

Discussion 

The data provided herein demonstrate that DamID can be used to identify
sequences that interact in vivo with specific proteins. Targeting of Dam leads to up to an
approximately 10-fold enrichment of methylation in the vicinity of binding sites of the Dam
fusion partner, which is sufficient for positive identification of most target sequences, and
for detecting quantitative differences in protein-target interactions. The 'background'
methylation throughout the genome by Dam fusion proteins was attributed to the intrinsic
DNA binding activity of Dam, which would compete with the sequence- or locus-specific
interactions of its fusion partner. However, it is also possible that chromatin proteins are
rather promiscuous in their interactions in vivo.
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euchromatic loci. It has been thought that HP1 is recruited to specific genomic loci by other
chromatin proteins (Platero et al., EMBO J. 14:3977-3986 (1995)). To identify HP1 target
loci, a myc-epitope tagged Dam-HP1 fusion protein construct driven by a heat-shock
promoter (Fig. 2) was transfected into Drosophila Kc cells, and the resulting methylation
patterns were mapped. As a control, myc-tagged Dam (Dam-myc) was expressed.

In order to demonstrate that fusion to Dam did not impair correct targeting of
HP1, advantage was taken of the observation that HP1 in Kc cells was predominantly
located in a large discrete compartment in the nucleus. This compartment represented
clustered pericentric heterochromatin of all chromosomes, since all centromeres were
generally located within this compartment. In agreement with this, AT-rich heterochromatic
satellite repeats, which were visible as DAPI-bright regions, were closely associated with the
HP1 compartment. HP1 was also seen in a few small brightly stained dots scattered
throughout the nucleoplasm.

After heat shock induction, the Dam-HP1 fusion protein (detected with an
antibody against the myc epitope) showed a subnuclear distribution pattern that strongly
resembled that of endogenous HP1. Both the large compartment, closely associated with the
DAPI-bright regions, and a few small dots scattered throughout the nucleus were observed.
This indicated that the fusion protein was correctly targeted to natural HP1 binding sites. In
contrast, after heat-shock induction the Dam-myc protein showed a very weak staining
throughout the cell, with no indication of subnuclear targeting. In the absence of heat shock
induction the Dam-HP1 or Dam-myc proteins were not detectable by immunofluorescence,
indicating very low expression levels under those conditions.

Whether expression of Dam-HP1 leads to preferential methylation of
heterochromatic DNA was also tested. Sites of methylation were visualized in situ using a
rabbit antiserum (Bringmann et al., FEBS Lett. 213:309-315 (1987)) against
methyl-N6-adenine (m6A). After heat-shock induction, cells transfected with either Dam or Dam-HP1
showed strong m6A staining throughout the nucleus. Cells transfected with an empty vector
showed no nuclear staining. Strikingly, in the absence of heat shock induction, cells
transfected with Dam-HP1 generally showed staining of the large, discrete nuclear
compartment associated with DAPI-bright regions. Co-staining of m6A with an antibody
against endogenous HP1 confirmed that methylation was mostly restricted to the
heterochromatic compartment. An imprecise correspondence of the two staining patterns
was expected because of a lack of GATCs in the simple satellites in Drosophila
heterochromatin. In metaphase chromosomes of cells transfected with Dam-HP1, m6A was
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the resulting DNA preparation was incubated 16 hours at 37 °C with or without 2 units
DpnII. After heat-inactivation at 80 °C, samples were diluted 1:10 or 1:100 and assayed by
TaqMan quantitative PCR (Li et al., Curr. Opin. Biotechnol. 9:43-48 (1998)) on an ABI 7700
Sequence Detection System (PE Biosystems, Foster City, CA) according to the
manufacturer's recommendations. Fluorogenic oligonucleotides were obtained from
Synthegen (Houston, TX). A standard dilution series of genomic DNA from w; EP(2)0750
flies was included in every experiment to allow relative quantitation of each sample. PCR
primers were chosen to flank one single GATC.
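Relative quantitation against a standard dilution series can be sketched as below. This is a minimal illustration of the usual standard-curve approach (a least-squares line of threshold cycle against log10 of template amount), not a description of the instrument software; the function names and the idea of reporting the digested-to-undigested ratio as a single number are assumptions.

```python
from math import log10


def fit_standard_curve(amounts: list, cts: list) -> tuple:
    """Least-squares fit of Ct = slope * log10(amount) + intercept."""
    if len(amounts) != len(cts) or len(amounts) < 2:
        raise ValueError("need at least two standards with matching Ct values")
    xs = [log10(amount) for amount in amounts]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(cts) / len(cts)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, cts))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def quantity_from_ct(ct: float, slope: float, intercept: float) -> float:
    """Interpolate the relative template amount for a measured Ct."""
    return 10 ** ((ct - intercept) / slope)


def digested_to_undigested_ratio(ct_digested: float, ct_undigested: float,
                                 slope: float, intercept: float) -> float:
    """Ratio of amplifiable template with versus without enzyme digestion."""
    return (quantity_from_ct(ct_digested, slope, intercept)
            / quantity_from_ct(ct_undigested, slope, intercept))
```

Because the primers flank a single GATC, the ratio reflects cleavage at that one site; how the ratio maps onto the methylated fraction depends on which methylation-sensitive enzyme is used.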

Results

In vivo targeting of Dam to a specific DNA sequence: It has been demonstrated that a DNA
cytosine methyltransferase can be targeted in vitro to a specific DNA sequence by tethering
it to a DNA-binding protein (Xu et al., Nat. Genet. 17:376-378 (1997)). A similar approach
was tested to demonstrate whether it could be used to target E. coli DNA adenine methyl
transferase to a specific DNA locus in vivo in D. melanogaster. DNA adenine methyl
transferase methylates the N6-position of adenine in the nucleotide sequence GATC, which
occurs on average every 200-300 bp in the fly genome. DNA adenine methyl transferase
(Dam) was chosen because endogenous methylation of adenine does not occur in DNA of
most eukaryotes. Moreover, Dam is active when expressed in yeast (Gottschling, Proc.
Natl. Acad. Sci. USA 89:4062-4065 (1992); Singh et al., Genes Dev. 6:186-196 (1992);
Kladde et al., Proc. Natl. Acad. Sci. USA 91:1361-1365 (1994)) and Drosophila (Wines et
al., Chromosoma 104:332-340 (1996)) and has no detectable effects on Drosophila
development or viability (Wines et al., Chromosoma 104:332-340 (1996)), in contrast to
certain cytosine methyltransferases (Lyko et al., Nat. Genet. 23:363-366 (1999)).
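The stated GATC frequency is consistent with simple expectation: in a random sequence with equal base frequencies, a given 4-bp site occurs on average once every 4^4 = 256 bp. The following Python sketch shows how the observed spacing could be measured for any sequence; the function names are hypothetical and the code is offered only as an illustration.

```python
def gatc_positions(seq: str) -> list:
    """Return the 0-based start positions of every GATC in seq."""
    seq = seq.upper()
    positions = []
    index = seq.find("GATC")
    while index != -1:
        positions.append(index)
        index = seq.find("GATC", index + 1)
    return positions


def mean_gatc_spacing(seq: str):
    """Mean distance in bp between consecutive GATC sites, or None if < 2 sites."""
    positions = gatc_positions(seq)
    if len(positions) < 2:
        return None
    return (positions[-1] - positions[0]) / (len(positions) - 1)
```

GATC is its own reverse complement, so scanning one strand finds every site on both strands.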

The well-characterized budding yeast protein GAL4 (Fischer et al., Nature
332:853-856 (1988)) was chosen as a DNA targeting protein. The fly line (GALDam1) was
established to express a fusion protein (GALDam) consisting of full-length Dam and the
DNA-binding domain of GAL4 (GAL4dbd; Fig. 1A). A binding sequence for GAL4 was
introduced by crossing GALDam1 flies to line EP(2)0750, which carries a P-element with
14 tandem binding sites for GAL4 (UAS14) (Rorth, Proc. Natl. Acad. Sci. USA 93:12418-12422
(1996)) inserted into a sequenced region of chromosome 2 (Fig. 1B). As a control,
EP(2)0750 was crossed to the fly line Me4, which expresses Dam alone (Wines et al.,
Chromosoma 104:332-340 (1996)). The progenies of these crosses were used to test
whether GAL4dbd was able to target Dam to GATCs in the vicinity of the UAS14 array.
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In this example a chromatin protein fusion protein with E. coli DNA adenine
methyl transferase linked with Heterochromatin protein 1 (HP1) was used to identify DNA
loci that interact with HP1 in D. melanogaster.

Expression vectors: The Dam open reading frame was amplified by PCR from plasmid
YCpGAL-EDAM (Wines et al., Chromosoma 104:332-340 (1996)) and cloned into
pCaSpeR-hs followed or preceded by a linker oligonucleotide encoding the myc-epitope tag
GluGlnLysIleSerGluGluAspLeu (SEQ ID NO: 1), resulting in vectors pNDamMyc and
pCMycDam, respectively. Vector pNDamMyc carries a stop codon 15 amino acid residues
after the myc-tag, and was used to express the Dam-myc protein. A fragment encoding
amino acid residues 1-145 of GAL4 was amplified by PCR from plasmid pSPGAL1-145
(provided by S. M. Parkhurst, Fred Hutchinson Cancer Research Center, Seattle, WA) and
cloned in-frame into vector pCMycDam, resulting in plasmid pGALDam. The full-length
ORF of D. melanogaster HP1 was amplified by PCR from plasmid pTH5 (Eissenberg et al.,
Proc. Natl. Acad. Sci. USA 87:9923-9927 (1990), incorporated herein by reference) and
cloned in-frame into pNDamMyc, resulting in plasmid pDamHP1.

Cell culture and immunocytochemistry: Kc167 cell culture and transfections were performed
as described (Henikoff et al., Proc. Natl. Acad. Sci. USA 97:716-721 (2000), incorporated
herein by reference). In some in situ staining experiments, cells were heat-shocked for 2
hours at 37 °C, followed by 5 hours recovery at 25 °C prior to fixation. In situ staining of
proteins was carried out as described (van Steensel et al., J. Cell Sci. 108:3003-3011 (1995),
incorporated herein by reference) with C1A9 antibody against HP1 (James et al., Eur. J.
Cell Biol. 50:170-180 (1989), incorporated herein by reference), a rabbit antiserum against
Cid (Henikoff et al., Proc. Natl. Acad. Sci. USA 97:716-721 (2000), incorporated herein by
reference) or monoclonal antibody 9E10 against the myc-epitope tag (Santa Cruz
Biotechnology, Santa Cruz, CA). For in situ detection of m6A in interphase cells, transfected
Kc cells were grown on glass coverslips, fixed in methanol/acetic acid (3:1) for 10 minutes,
washed in 70% ethanol followed by 2X SSC, denatured in 70% formamide in 2X SSC at
80 °C for 10 minutes, washed in phosphate buffered saline, and stained with antibody RI 280
(Bringmann and Luhrmann, FEBS Lett. 213:309-315 (1987), incorporated herein by
reference) following the same procedure as for proteins. Mitotic spreads were prepared by
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different chromatin proteins in the same cell type can reveal functional interactions (or lack
thereof) between these proteins. At the level of an organism, the profiles can be used to
compare the profiles between different organisms or between different [...]
developmental stages) of an organism. The power of the present approach is illustrated by
comparative profiling of HP1 and DmSir2-1, which indicated that DmSir2-1 was not a
heterochromatin protein.

Chromatin profiling can become a powerful tool in the analysis of cellular
differentiation. It is anticipated that chromatin profiles made available using the methods
disclosed herein for many proteins will be unique for specific cell types. Systematic
mapping of such profiles can provide fundamental new insights into the mechanisms of
cellular differentiation and transformation to a malignant condition. The methods as
disclosed herein based on chromatin protein targeting of a nucleotide modification enzyme
can be particularly useful in mammalian cells, in which other global mapping approaches
based on chromatin immunoprecipitation methods (Blat and Kleckner, Cell 98:249-259
(1999)) may fail due to the high complexity of the genome and insufficient specificity of
antibodies.

Moreover, in analogy to mRNA expression profiles (Golub et al., Science
286:531-537 (1999); Ross et al., Nature Genet. 24:227-235 (2000)), chromatin profiles can
be used in studies of cellular pathology. One important application can be in the discovery
and prediction of cancer types. Different classes of tumor cells are likely to display distinct
chromatin profiles, and these profiles may therefore have high analytic and diagnostic value.
The wide variety of chromatin proteins can allow a much more detailed and robust
classification of cancer types than expression profiling, which relies on only one data set
(i.e., mRNA abundances) per cell type.

Methods of the present invention can also be used to provide chromatin
profiles of individuals with immune deficiency or autoimmune conditions as well as
examining chromatin changes in reaction to various drugs and other agents. In addition,
chromatin binding profiles can be constructed for respons [...]
organisms and expression profiles can be constructed for any transcription factor or other
regulatory molecule or agent.

In another embodiment of the invention, the described methods can be
applied to obtaining a methylation profile. Within this method, and unlike chromatin
profiling, which requires the introduction of a fusion protein, genomic DNA is obtained from
a cell, tissue or organism of interest and from a control cell, tissue, or organism of interest
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fragments smaller than about 2.5 kb are typically added to a test array. Arrays useful in the
present invention include, but are not limited to cDNA, DNA, DNA selected to contain
primarily chromatin binding regions or protein binding regions, and the like. Each sample of
methylated and control fractionated DNA can be labeled with, for example, a different
fluorescent label. The labeled samples are mixed and applied to the array under conditions
conducive for hybridization using methods well known in the art. The arrays are scanned for
the detection of the two labels and the loci recognized by the chromatin protein can be
mapped.

Additional methods for the purification of methylated DNA regions, which
can be applied separately or used in various combinations in order to further increase the
purity of the isolated methylated regions, include the following:

Methylated DNA firagments can be affinity purified using antibodies against 
^A Monoclonal antibody (for exanq)le, clone PI A8) which ^ecifically recognize methyl- 
6-adenine (^A) have been generated using a procedure previously described (Bringmaim et 

15 A., FEES Lett, 213:309-315 (1987)). The antibody obtain^ can be used in conjunction with 
the restriction endonuclease i:!p7iI to afifinity purify methylated First, 
purified genomic DNA is digested withX^p/iI, which results into exposure of "^A at the blunt 
ends of the digestion products. Antibody was allowed to bind to the exposed "^A 
Antibody-DNA complexes were then isolated using (for example) protein A - sepharose 

20 beads (Amersham) pre-coated with rabbit-anti-^QUse antibody. After purification, 

methylated DNA firagments wore eluted fmm the antibody by incubation with 20 mM free 
methyl-6-adenosine. 
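
The DpnI digestion step can be pictured with a small Python sketch. It assumes a single-strand view of the DNA and a hypothetical set of methylated GATC start positions; DpnI cuts methylated GATC between the A and the T to leave blunt ends, so each fragment boundary falls two bases into a methylated site. The toy sequence and function name are illustrative only.

```python
def dpni_digest(sequence, methylated_sites):
    """Cut `sequence` at GATC sites whose start index is in `methylated_sites`.

    Unmethylated GATC sites are left intact. Each cut falls between the
    A and the T of the site, which models a blunt-ended product.
    """
    cut_points = []
    start = sequence.find("GATC")
    while start != -1:
        if start in methylated_sites:
            cut_points.append(start + 2)
        start = sequence.find("GATC", start + 1)

    fragments = []
    previous = 0
    for cut in cut_points:
        fragments.append(sequence[previous:cut])
        previous = cut
    fragments.append(sequence[previous:])
    return fragments

# Toy sequence with GATC at indices 2, 10 and 17; only 2 and 17 are methylated.
toy_genome = "AAGATCTTTTGATCCCCGATCGG"
fragments = dpni_digest(toy_genome, methylated_sites={2, 17})
```

In this toy case the unmethylated site at index 10 stays inside the middle fragment, and every new end created by the digest carries the adenine of a methylated site.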

Further, methylation-specific PCR amplification has been used to isolate
methylated DNA fragments. After digestion with DpnI, an excess of double-stranded
adaptor oligonucleotide (with non-phosphorylated 5' ends to prevent self-ligation of the
oligonucleotide) was ligated to the exposed blunt DNA ends using T4 DNA ligase.
Because DpnI cuts only methylated GATC sequences, the adaptor only ligated to methylated
DNA ends. The ligated fragments were specifically amplified by PCR using a primer
complementary to the adaptor sequence. The specificity of this procedure can be further
enhanced using either of two modifications.

In one modification, prior to the DpnI digestion, genomic DNA is treated
with a DNA phosphatase such as alkaline phosphatase. This prevents ligation of the adaptor
to DNA ends that were not the product of DpnI digestion (e.g., DNA breaks resulting from
mechanical shearing or contaminant endonuclease activity during the purification of

distance of the loci recognized by the chromatin protein. It is important that the
modification of a sufficient number of nucleotide residues to provide a detectable signal is
not toxic to the cells, tissues or organism being tested. Therefore, as above, promoters which
provide for low levels of expression are used and nucleotide modification enzymes which
provide non-toxic nucleotide modifications are used.

Detection of Chromatin Binding Sites:

Several methods are available for the detection of modified nucleotides in the
vicinity of the binding loci recognized by the chromatin protein. These include, but are not
limited to, immunohistochemistry, Southern blot analysis, PCR analysis and array (i.e.,
macro- and micro-array) analysis.

In a typical embodiment, cells are grown or collected on a solid phase
appropriate for microscopy. For example, transformed cells can be cultured on a glass
microscope cover slip. The cells are then fixed and washed. An antibody [...]
nucleotide modification carried out by the nucleotide modification enzyme of the fusion
protein is added. The antibody can be either polyclonal antisera or a monoclonal antibody.
Antibody can be labeled directly or a second labeled antibody can be used [...]
nucleotide modification. Following an incubation period the cells are washed and the
antibody is detected, providing a location within the nucleus where the chro[...]
complexes within the chromatin. In one particular embodiment the cells are prepared as
mitotic spreads by methods well known to the skilled artisan.

A wide variety of labels can be employed for detection of the nucleotide
modification. For example, the label can be a chemiluminescent, enzyme, fluorophore, or
radioactive moiety, and the like. Typically, fluorescent labels, such as fluorescein,
phycoerythrin (PE), Cy3, Cy5, Cy7, Texas Red, allophycocyanin (APC), Cy7APC, Cascade
Blue, Cascade Yellow, and the like, can be used. Methods for labeling antibodies are well
known to the skilled artisan.

In still another embodiment Southern blot can be used to map the region of
the chromatin where a nucleotide modification has occurred. Typically, genomic DNA is
isolated from a population of cells transformed with the vector capable of expressing the
chromatin binding protein-nucleotide modification enzyme fusion polypeptide by methods
well known to the skilled artisan. The population of cells useful in the methods of the
present invention can be isolated from cells grown in vitro, isolated from a single tissue, or
isolated from a multicellular organism.

Nucleotide modifying enzymes, fragments, derivatives and analogs thereof
useful in the present invention are those which can modify one or more nucleotides in a
nucleic acid sequence, such as an RNA, DNA, or the like, under conditions found in a live
cell and in a manner which is detectable. The enzyme must also modify the nucleotides in a
manner which is not toxic to the cell. In other words, the cell or organism must be able to
continue to proliferate and differentiate in a normal manner. For the modification to be
detectable, an enzyme is selected which modifies the nucleotide in a manner which is not
typical of a modification commonly found in the cell being assayed. For instance, in
eukaryotic cells it is typical to select as the modification enzyme, for example, DNA adenine
methyl transferase because methylation of adenine is not common in eukaryotic cells.
Additional nucleotide modification enzymes useful in the present invention include, for
example, but are not limited to, adenine methyltransferases, cytosine methyltransferases,
thymidine hydroxylases, hydroxymethyluracil β-glucosyl transferases, adenosine
deaminases, and the like. However, as described in more detail below, within one
embodiment, a modification of the method of the present invention relies on an endogenous
modification enzyme to modify DNA in a cell; the sites of such modifications are then
determined by a variety of detection means, including the use of nucleic acid arrays.

In the methods of the present invention, the DNA modification enzyme,
fragment, derivative, or analog thereof, is targeted to the loci associated with the binding of
the chromatin protein by the chromatin protein, fragment, derivative or analog thereof as a
fusion protein. Typically, the polypeptides which comprise the chromatin protein and the
DNA modification enzyme are separated from one another by one or more amino acid
residues which comprise a linker sequence. The linker can be from about 1 to about 1000
amino acid residues, or more. Typically, the linker sequence is from about 3 to about 300
amino acid residues. The amino acid sequence can be from another polypeptide or can be an
artificial sequence of amino acid residues, such as, for example, Gly and Ser residues which
provide a flexible linear amino acid sequence allowing the amino acid sequences for the
chromatin polypeptide and the nucleotide modification enzyme to fold into an active
configuration. In a particular embodiment of the present invention a linker peptide
comprising the myc-epitope tag GluGlnLysIleSerGluGluAspLeu (SEQ ID NO: 1) was
inserted between the chromatin polypeptide and the nucleotide modification enzyme DNA
adenine methyl transferase.
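
The linker arrangement can be sketched in a few lines of Python. The repeated `GGGGS` unit, the placeholder sequences and the function names are assumptions made for illustration; the text only states that Gly and Ser residues can form a flexible linker and gives the length ranges used below.

```python
def gly_ser_linker(repeats):
    """Hypothetical flexible linker built from repeated GGGGS units."""
    return "GGGGS" * repeats

def describe_linker(linker):
    """Classify a linker by the length ranges given in the text."""
    length = len(linker)
    if 3 <= length <= 300:
        return "typical"      # about 3 to about 300 residues
    if length >= 1:
        return "permitted"    # about 1 to about 1000 residues, or more
    return "none"

def build_fusion(chromatin_protein, linker, enzyme):
    """Concatenate chromatin protein, linker and enzyme (one-letter code)."""
    return chromatin_protein + linker + enzyme

# Placeholder sequences; these are not real protein sequences.
protein = "MKTAY"
enzyme = "MKKNR"
linker = gly_ser_linker(3)
fusion = build_fusion(protein, linker, enzyme)
```

A real construct would be assembled at the nucleic acid level; the string concatenation here only shows the order of the three parts.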
predominantly to pericentric genes and transposable elements, GAGA factor (GAF) which
associates with euchromatic genes that are enriched in (GA)n motifs, or a Drosophila
homolog of the yeast Sir2 gene (DmSir2-1) which associates with certain active genes can be
used to construct a fusion protein of the invention. Fragments, derivatives or analogs of a
chromatin protein or protein complex can be tested for the desired activity by procedures
known in the art, including but not limited to functional assays to determine whether the
fragment recognizes and binds the target loci or nucleotide sequence recognized by the
native full length chromatin binding protein. The affinity or avidity of the binding to the
target loci or nucleotide sequence can be the same as, less than or greater than the affinity
or avidity of the native full length protein. It is only necessary that the fragment, derivative
or analog recognize and bind the target loci or sequence. In addition, the chromatin
polypeptide fragment, derivative, or analog can be tested for the desired activity in the
fusion protein to ensure localization to the appropriate loci.

Polypeptide derivatives include naturally-occurring amino acid sequence
variants as well as those altered by substitution, addition or deletion of one or more amino
acid residues that provide for functionally active molecules. Polypeptide derivatives include,
but are not limited to, those containing as a primary amino acid sequence all or part of the
amino acid sequence of a native chromatin polypeptide including altered sequences in which
one or more functionally equivalent amino acid residues (e.g., a conservative substitution)
are substituted for residues within the sequence, resulting in a silent change.

In another aspect, polypeptides of the present invention include those peptides
having one or more consensus amino acid sequences shared by all members of the chromatin
protein family, but not found in other proteins. Database analysis indicates that
these consensus sequences are not found in other polypeptides, and therefore this
evolutionary conservation reflects the nucleotide target binding [...]
chromatin polypeptides. Chromatin polypeptide family members, including fragments,
derivatives and/or analogs comprising one or more of these consensus sequences, are also
within the scope of the invention.

In another aspect, a polypeptide consisting of or comprising a fragment of a
chromatin polypeptide having at least 5 contiguous amino acids of the chromatin
polypeptide which recognize the specific target nucleotide sequence is provided. In other
embodiments, the fragment consists of at least 20 or 50 contiguous a[...]
chromatin polypeptide. In a specific embodiment, the fragments are not larger than 35, 100
or even 200 amino acids.

aligned with a word of the same length in a database sequence. T is referred to as the
neighborhood word score threshold (Altschul et al. (1990), supra). These initial
neighborhood word hits act as seeds for initiating searches to find longer HSPs containing
them. The word hits are then extended in both directions along each sequence for as far as
the cumulative alignment score can be increased. Extension of the word hits in each
direction is halted when: the cumulative alignment score falls off by the quantity X from its
maximum achieved value; the cumulative score goes to zero or below, due to the
accumulation of one or more negative-scoring residue alignments; or the end of either
sequence is reached. The BLAST algorithm parameters W, T, and X determine the
sensitivity and speed of the alignment. The BLAST program uses as defaults a word length
(W) of 11, the BLOSUM62 scoring matrix (see Henikoff & Henikoff, Proc. Natl. Acad. Sci.
USA 89:10915-9 (1992), which is incorporated by reference herein) [...]
expectation (E) of 10, M=5, N=-4, and a comparison of both strands.
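
The three halting conditions for extending a word hit can be illustrated with a short Python sketch. It is a simplified, ungapped, one-directional extension that starts from an exact-match seed and uses the M=5 and N=-4 scores quoted above; the function name, the toy sequences and the choice of X are assumptions, and this is not the BLAST implementation itself.

```python
def extend_right(query, subject, q_start, s_start, word_len,
                 match=5, mismatch=-4, x_drop=10):
    """Extend a seed hit to the right without gaps.

    Returns (best_score, best_length). Extension stops when the running
    score drops by at least `x_drop` below its maximum, when it reaches
    zero or below, or when either sequence ends.
    """
    def pair_score(offset):
        same = query[q_start + offset] == subject[s_start + offset]
        return match if same else mismatch

    score = sum(pair_score(i) for i in range(word_len))
    best_score, best_len = score, word_len
    i = word_len
    while q_start + i < len(query) and s_start + i < len(subject):
        score += pair_score(i)
        i += 1
        if score > best_score:
            best_score, best_len = score, i
        if score <= 0 or best_score - score >= x_drop:
            break
    return best_score, best_len

# Toy sequences: eight matching bases followed by four mismatches.
hit = extend_right("ACGTACGTAAAA", "ACGTACGTCCCC", 0, 0, word_len=4)
```

A full search would also extend to the left and would seed from neighborhood words scoring at least T rather than from an exact match.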

In addition to calculating percent sequence identity, the BLAST algorithm
also performs a statistical analysis of the similarity between two sequences (see, e.g., Karlin
& Altschul, Proc. Natl. Acad. Sci. USA 90:5873-77 (1993), which is incorporated by
reference herein). One measure of similarity provided by the BLAST algorithm is the
smallest sum probability (P(N)), which provides an indication of the probability by which a
match between two nucleotide or amino acid sequences would occur by chance. For
example, a nucleic acid is considered similar to a reference sequence if the smallest sum
probability in a comparison of the test nucleic acid to the reference nucleic acid is less than
about 0.1, more typically less than about 0.01, and most typically less than about 0.001.
Further, a polypeptide is typically substantially identical to a second polypeptide, for
example, where the two peptides differ only by conservative substitutions.

The terms "transformation" or "transfection" mean a process of stably or
transiently altering the genotype of a recipient cell or microorganism by the introduction of
polynucleotides. This is typically detected by a change in the phenotype of the recipient cell
or organism. The term "transformation" is generally applied to microorganisms, while
"transfection" is used to describe this process in cells derived from multicellular organisms.

Generally, other nomenclature used herein and many of the laboratory
procedures in cell culture, molecular genetics and nucleic acid chemistry and hybridization,
which are described below, are those well known and commonly employed in the art. (See
generally Ausubel et al. (1996) supra; Sambrook et al., Molecular Cloning: A Laboratory
Manual, Second Edition, Cold Spring Harbor Laboratory Press, New York (1989), which are

The term "substantial similarity" in the context of polypeptide sequences
indicates that the polypeptide comprises a sequence with at least 70% sequence identity to a
reference sequence, or preferably 80%, or more preferably 85% sequence identity to the
reference sequence, or most preferably 90% identity over a comparison window of about 10-
20 amino acid residues. In the context of amino acid sequences, "substantial similarity"
further includes conservative substitutions of amino acids. Thus, a polypeptide is
substantially similar to a second polypeptide, for example, where the two peptides differ
only by one or more conservative substitutions.

The term "conservative substitution," when describing a polypeptide, refers to
a change in the amino acid composition of the polypeptide that does not substantially alter
the polypeptide's activity. Thus, a "conservative substitution" of a particular amino acid
sequence refers to substitution of those amino acids that are not critical for
activity or substitution of amino acids with other amino acids having similar properties (e.g.,
acidic, basic, positively or negatively charged, polar or non-polar, and the like) such that the
substitution of even critical amino acids does not substantially alter activity. Conservative
substitution tables providing functionally similar amino acids are well known in the art. For
example, the following six groups each contain amino acids that are conservative
substitutions for one another: 1) alanine (A), serine (S), threonine (T); 2) aspartic acid (D),
glutamic acid (E); 3) asparagine (N), glutamine (Q); 4) arginine (R), lysine (K); 5) isoleucine
(I), leucine (L), methionine (M), valine (V); and 6) phenylalanine (F), tyrosine (Y),
tryptophan (W). (See also Creighton, Proteins, W. H. Freeman and Company (1984).) In
addition, individual substitutions, deletions or additions that alter, add or delete a single
amino acid or a small percentage of amino acids in an encoded sequence are also
"conservative substitutions."

For sequence comparison, typically one sequence acts as a reference
sequence, to which test sequences are compared. When using a sequence comparison
algorithm, test and reference sequences are input into a computer, subsequence coordinates
are designated, if necessary, and sequence algorithm program parameters are designated.
The sequence comparison algorithm then calculates the percent sequence identity for the test
sequence(s) relative to the reference sequence, based on the designated program parameters.
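
As a worked illustration of percent identity and of the six conservative substitution groups, the following Python sketch compares a test peptide with a reference over a 10-residue window. It assumes the two sequences are already aligned without gaps and have equal length; the peptides and function names are hypothetical, and percent identity conventions differ between programs.

```python
# The six groups are those listed in the text; everything else is illustrative.
CONSERVATIVE_GROUPS = [set("AST"), set("DE"), set("NQ"),
                       set("RK"), set("ILMV"), set("FYW")]

def percent_identity(test_seq, reference_seq):
    """Percent of identical positions in two pre-aligned, equal-length sequences."""
    if not test_seq or len(test_seq) != len(reference_seq):
        raise ValueError("sequences must be non-empty and of equal length")
    matches = sum(1 for a, b in zip(test_seq, reference_seq) if a == b)
    return 100.0 * matches / len(test_seq)

def is_conservative(a, b):
    """True if residues a and b fall in the same conservative group."""
    return any(a in group and b in group for group in CONSERVATIVE_GROUPS)

def differ_only_conservatively(test_seq, reference_seq):
    """True if every mismatch between the sequences is a conservative substitution."""
    if len(test_seq) != len(reference_seq):
        return False
    return all(a == b or is_conservative(a, b)
               for a, b in zip(test_seq, reference_seq))

# Hypothetical 10-residue comparison window.
reference = "ASDKLFNRVE"
test_peptide = "TSEKIFNRVE"
identity = percent_identity(test_peptide, reference)
conservative_only = differ_only_conservatively(test_peptide, reference)
```

Here the test peptide is 70% identical to the reference and all three differences (A/T, D/E, L/I) fall within a single group each.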

Optimal alignment of sequences for comparison can be conducted, for
example, by the local homology algorithm of Smith & Waterman ([...]
(1981), which is incorporated by reference herein), by the homology alignment algorithm of
Needleman & Wunsch (J. Mol. Biol. 48:443-453 (1970), which is incorporated by reference

"Chromatin protein" includes, but is not limited to, histones, transcriptional
factors, centromere proteins, heterochromatin proteins, euchromatin proteins, condensins,
cohesins, origin recognition complexes, histone kinases, dephosphorylases,
acetyltransferases, deacetylases, methyltransferases, demethylases, and other enzymes that
covalently modify histone, DNA repair proteins, proteins involved in DNA replication,
proteins involved in transcription, proteins part of dosage compensation complexes and X-
chromosome inactivation, proteins that are part of chromatin remodeling complexes,
telomeric proteins, and the like.

"Chromatin protein-enzyme fusion polypeptide" refers to a polypeptide
encoded by a polynucleotide encoding the chromatin protein operatively associated with a
polynucleotide which encodes a nucleotide modification enzyme. Also encompassed within
this definition are polynucleotides which encode a functionally active fragment, derivative or
analog of the chromatin protein or nucleotide modification enzyme. The term "polypeptide"
refers to a polymer of amino acids and its equivalent and does not refer to a specific length
of the product; thus, peptides, oligopeptides and proteins are included within the definition
of a polypeptide. A "fragment" refers to a portion of a polypeptide having typically at least
10 contiguous amino acids, more typically at least 20, still more typically at least 50
contiguous amino acids of the chromatin protein. A "derivative" is a polypeptide which is
identical or shares a defined percent identity with the wild-type chromatin protein or
nucleotide modification enzyme. The derivative can have conservative amino acid
substitutions, as compared with another sequence. Derivatives further include, for example,
glycosylations, acetylations, phosphorylations, and the like. Further included within the
definition of "polypeptide" are, for example, polypeptides containing one or more analogs of
an amino acid ([...] unnatural amino acids, and the like), polyp[...]
linkages as well as other modifications known in the art, both naturally and non-naturally
occurring. Ordinarily, such polypeptides will be at least about 50% identical to the native
chromatin binding protein or nucleotide modification enzyme amino acid sequence, typically
in excess of about 90%, and more typically at least about 95% identical. The polypeptide can
also be substantially identical as long as the fragment, derivative or analog displays similar
functional activity and specificity as the wild-type chromatin protein or nucleotide
modification enzyme.

The terms "amino acid" or "amino acid residue", as used herein, refer to
naturally occurring L amino acids or to D amino acids [...]. The
commonly used one- and three-letter abbreviations for amino acids are used herein (see,

methylation in Dam-HP1 transfected cells and Dam-myc transfected cells, calculated from
the data in Fig. 3B. Shading is the same as the boxes in Fig. 3A. Bullets indicate ratios that
are significantly different (one, p<0.05; two, p<0.01; three, p<0.001 according to the Mann-
Whitney U-test) from the pooled ratios of the four heterochromatic loci (black bullets) or the
five euchromatic loci (white bullets). Error bars represent standard deviations. The number
of observations is indicated in parentheses.

Fig. 4A and Fig. 4B depict mapping of HP1 target loci. Fig. 4A demonstrates
a chromosomal map of Cy3:Cy5 ratios (representative experiment). Probed loci are
indicated by their approximate position on the cytogenetic map. Centromeres are indicated
by ovals. The large heterochromatic proximal region of the X chromosome is depicted as a
rectangle to the left of the centromere (not to scale). Some genes with relatively high levels
of HP1 binding are labeled. Fig. 4B depicts dispersed repetitive elements (mostly
transposons).

Fig. 5A through Fig. 5C depict the mapping of GAF target loci. Fig. 5A
provides a chromosomal map of Cy3:Cy5 ratios (average of two experiments) using a GAF-
Dam fusion protein. Some genes with relatively high levels of GAF binding are labeled.
Fig. 5B depicts dispersed repetitive elements (mostly transposons). Fig. 5C depicts a box
plot showing the relative abundances of GAGAG (SEQ ID NO: 4) and GAGAGAG (SEQ
ID NO: 5) sequence elements in probed regions with low (open boxes) and high (filled
boxes) levels of GAF binding. Horizontal lines represent the 10th, 25th, 50th (median), 75th
and 90th percentiles. p-values are according to the Mann-Whitney U-test.
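
The relative abundance of a sequence element such as GAGAG in a probed region could be computed along the following lines. This Python sketch counts overlapping occurrences on one strand only and reports them per kilobase; the normalization, the function name and the toy region are assumptions, since the text does not state how the abundances were calculated.

```python
import re

def motif_density(sequence, motif):
    """Occurrences of `motif` per kilobase of `sequence`, counting overlaps."""
    # A zero-width lookahead lets overlapping matches be counted separately.
    pattern = re.compile("(?=" + re.escape(motif) + ")")
    count = sum(1 for _ in pattern.finditer(sequence))
    return 1000.0 * count / len(sequence)

# Toy 10-base region; real probed regions would be far longer.
region = "GAGAGAGTTT"
gagag_density = motif_density(region, "GAGAG")
gagagag_density = motif_density(region, "GAGAGAG")
```

Densities computed this way for regions with low and high binding could then be compared with a rank-based test such as the Mann-Whitney U-test named in the figure legend.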

Fig. 6A and Fig. 6B depict the mapping of DmSir2-1 target loci. Fig. 6A
provides a chromosomal map of Cy3:Cy5 ratios (average of two experiments) for
chromosomes 2, 3 and 4, and the X chromosome. Fig. 6B depicts dispersed repetitive
elements (mostly transposons). Some genes of particular interest or with high levels of
DmSir2-1 are labeled.

DESCRIPTION OF THE SPECIFIC EMBODIMENTS 

The terms "polynucleotide" and "nucleic acid" refer to a polymer composed
of a multiplicity of nucleotide units (ribonucleotide or deoxyribonucleotide or related
structural variants) linked via phosphodiester bonds. A polynucleotide or nucleic acid can
be of substantially any length, typically from about six (6) nucleotides to about 10^[...]
nucleotides or larger. Polynucleotides and nucleic acids include RNA, cDNA, genomic

embodiment of the present invention a polynucleotide encoding Escherichia coli DNA
adenine methyltransferase was used as the tethered nucleotide modification enzyme.

Once the nucleotide modification enzyme has been directed to the chromatin
binding site by the chromatin protein, the nucleotide modification enzyme can modify
nucleotides of the chromatin in the region of the binding site. These modifications of the
nucleotides can be detected by various methods including immunochemistry, Southern blot,
PCR, and various types of macro- and micro-arrays. The binding loci of the chromatin
protein can be identified by determining the location of the detected nucleotide modifications
within the chromatin. In a specific embodiment, the loci of the chromatin proteins
heterochromatin binding protein 1, GAGA factor and Drosophila DmSir2-1 gene were
determined by immunocytochemistry.

The methods of the present invention also provide methods for large scale
mapping of loci of chromatin proteins. The methods can be used to obtain detailed
genome-wide maps of the binding patterns of chromatin proteins in, for example, cell
populations grown in culture, tissues, or in cells isolated from an entire multicellular
organism. The chromatin profiles can provide information into the functions and mechanisms
of action of chromatin proteins on an individual cellular basis, at the tissue level and the
organism level. In a particular embodiment pairwise comparison of profiles of different
chromatin proteins in the same cell type can be used to determine functional interactions (or
lack thereof) between these proteins. At the level of an organism, the profiles can be used to
compare the profiles between different organisms or between different states (e.g.,
developmental stages) of an organism.
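
One simple way to carry out a pairwise comparison of two profiles is a correlation coefficient over the loci they share. The Python sketch below uses a Pearson correlation, which is an assumption chosen for illustration and is not named in the text; the locus names and values are made up.

```python
import math

def pearson(profile_a, profile_b):
    """Pearson correlation between two profiles over their shared loci.

    Each profile maps a locus name to a binding measure, for example a
    log ratio from an array experiment.
    """
    loci = sorted(set(profile_a) & set(profile_b))
    xs = [profile_a[locus] for locus in loci]
    ys = [profile_b[locus] for locus in loci]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return covariance / math.sqrt(var_x * var_y)

# Made-up profiles for two hypothetical chromatin proteins.
protein_1 = {"locus_1": 1.0, "locus_2": 2.0, "locus_3": 3.0}
protein_2 = {"locus_1": 2.0, "locus_2": 4.0, "locus_3": 6.0}
protein_3 = {"locus_1": 3.0, "locus_2": 2.0, "locus_3": 1.0}
similar = pearson(protein_1, protein_2)
opposite = pearson(protein_1, protein_3)
```

A high positive value would be consistent with two proteins occupying the same loci, while a value near zero would suggest independent binding patterns; neither by itself establishes a functional interaction.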

The present invention further provides methods for producing a profile of
chromatin protein loci for a cell population of interest which method comprises: transfecting
the cell population with a plurality of expression vectors capable of expressing a plurality of
different chromatin protein-nucleotide modification enzyme fusion proteins, each expression
vector comprising a nucleic acid encoding a low efficiency promoter operatively associated
with a nucleic acid encoding the different chromatin proteins and a nucleic acid encoding a
nucleotide modification enzyme; culturing the transfected cells for a period of time sufficient
for expression of and binding of each of the plurality of chromatin protein-nucleotide
modification enzyme fusion polypeptides; and detecting the loci for each of the nucleotide
modifications within the chromatin of the cell population. The profile of chromatin protein
loci for the cell population is determined from the location of the DNA modifications.






(57) Abstract: A novel technique is provided, designated DamID, for the identification of
DNA loci that interact in vivo with specific nuclear proteins in eukaryotes. By tethering a
DNA modification enzyme, in particular E. coli DNA adenine methyl transferase (Dam), to a
chromatin protein, the DNA modification enzyme (Dam) can be targeted in vivo to the native
binding loci of the protein, resulting in local DNA modification. Sites of DNA modification
can subsequently be mapped using modification-specific restriction enzymes, antibodies, or
DNA array methods. DNA Modification Identification (DamID) has potential for genome-wide
mapping of in vivo target binding sites of chromatin proteins in various eukaryotes.

IDENTIFICATION OF IN VIVO DNA BINDING LOCI OF CHROMATIN
PROTEINS USING A TETHERED NUCLEOTIDE MODIFICATION ENZYME

RELATED APPLICATIONS

This application is a continuation in part of United States patent application
Serial number 60/ , filed March 1, 2001, and a continuation in part of United
States patent application 60/190,362, filed March 16, 2000, the disclosures of which are
incorporated herein by reference in their entirety.

BACKGROUND OF THE INVENTION

Chromatin is the highly complex structure consisting of DNA and hundreds
of directly and indirectly associated proteins. Most chromatin proteins exert their regulatory
and structural functions by binding to specific chromosomal loci. Knowledge of the nature
of the in vivo target loci is essential for the understanding of the functions and mechanisms
of action of chromatin proteins. Interactions between protein complexes and DNA are at the
heart of essential cellular processes such as transcription, DNA replication, chromosome
segregation, and genome maintenance. High-resolution, genome-wide maps of binding sites
of these proteins can provide a valuable resource for researchers studying chromosome
organization, chromatin structure, and gene regulation, but such comprehensive maps are
currently unavailable. Therefore, techniques are needed to identify DNA loci that interact in
vivo with specific proteins.

At present, only a few techniques are available to localize the genomic loci
recognized by DNA binding proteins (reviewed in Simpson, Curr. Opin. Genet. Dev. 9:225-
229 (1999)). In situ cross-linking methods followed by the immunoprecipitation purification
of protein-DNA complexes have been used to test the interaction of individual chromosomal
loci with a particular chromatin protein (Solomon et al., Cell 53:937-947 (1988); Law et al.,
Nucleic Acids Res. 26:919-924 (1998); Orlando et al., Methods 11:205-214 (1997); Kuo and
Allis, Methods 19:425-433 (1999); Blat et al., Cell 98:249-259 (1999); Orlando, Trends
Biochem. Sci. 25:99-104 (2000)). These previously disclosed techniques have the inherent
risk of artifacts induced by the cross-linking reagent, and highly specific antibodies against
each protein of interest are required, as well as relatively large numbers of cells. A
modification of this approach was recently employed to identify binding sites of cohesins
along a complete chromosome in yeast (Blat and Kleckner, Cell 98:249-259 (1999)).